Question 1

a) A 5-figure descriptive summary is the minimum, first quartile, median, third
quartile and maximum. Provide the 5-figure summary for this data set:

Avet 5°65 45534 bs. 7% S29, Ags ak 
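As a rough illustration of the definition in part a), the sketch below computes a 5-figure summary in Python. The `sample` values are hypothetical and are not the data set of this question. The quartiles use the "median of each half" convention, which is an assumption; other conventions give slightly different quartiles.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical sample, used only to show the call.
sample = [7, 1, 9, 3, 5, 4, 6]
summary = five_number_summary(sample)
print(summary)
```

The function needs at least two values so that both halves are non-empty.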

b) If there are 5 dimensions in a datacube, how many cuboids (tables) are there? 

c) If each of the 5 dimensions has 3 distinct values, how many cells (rows of tables) are 
there? 


d) The only base cells with non-zero count in a datacube with 3 dimensions are: 


(a1, b1, c1) 
(a1, b1, c2) 


What are the other non-zero cells in the datacube? 
(5+2+3+5=15 marks) 


Question 2 


a) Items occurring in transaction data sets have associated measures Confidence and
Support. Define these terms. 


In Table 1, 
1 indicates that the item was present in the transaction 
0 indicates that the item was not present in the transaction. 
b) Construct C1, the count of each item over all transactions. 

c) Construct L1, the items with Support_count >= 4. 

d) Construct C2, the count of 2-item groups, from L1. 

e) Construct L2, the 2-item groups with Support_count >= 4. 

f) Construct C3, the count of 3-item groups, from L2. 

g) Construct L3, the 3-item groups with Support_count >= 4. 

h) Using L3, generate an association rule that has Confidence >= 95%. 
(2+2+1+2+1+2+1+4=15 marks) 
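The C1, L1, C2, L2, C3, L3 sequence asked for above follows the level-wise (Apriori-style) pattern, which can be sketched in Python as below. This is an illustrative sketch only: the `transactions` list is hypothetical and is not Table 1, and the function names `support_counts`, `apriori_levels` and `confidence` are invented for this example.

```python
from itertools import combinations

def support_counts(transactions, candidates):
    """Count how many transactions contain each candidate itemset (a C_k table)."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}

def apriori_levels(transactions, min_count):
    """Return a list of (C_k, L_k) pairs for k = 1, 2, ... until no candidates remain."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    levels = []
    k = 1
    while candidates:
        counts = support_counts(transactions, candidates)                # C_k
        frequent = {c: n for c, n in counts.items() if n >= min_count}   # L_k
        levels.append((counts, frequent))
        k += 1
        # Join step: unions of frequent sets that have exactly k items.
        unions = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = [u for u in unions
                      if all(frozenset(s) in frequent for s in combinations(u, k - 1))]
    return levels

def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent => consequent: count(both) / count(antecedent)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_a

# Hypothetical transactions, not the contents of Table 1.
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk", "eggs"},
    {"bread", "milk", "eggs"},
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"eggs", "tea"},
]
levels = apriori_levels(transactions, min_count=4)
rule_conf = confidence(transactions, {"bread", "eggs"}, {"milk"})
```

Here `levels[k - 1]` holds the pair (C_k, L_k), and a rule is kept only if its confidence meets the chosen threshold.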


Question 3 
Table 2 
record | assignment 1   | assignment 2 
       | mark out of 10 | mark out of 100 
A      | 1              | 50 
B      | 5              | 85 
C      | is             | 45 
D      | 2              | 40 


a) Define a suitable difference function to measure the difference between records of 
the database Students( mark1, mark2 ), which is given in Table 2. The function should 
give equal weight to both assignments. 

b) Describe a method of clustering the data of Table 2 into 2 groups. You don’t have 
to do the clustering, just describe the method. 

(5+5=10 marks) 


Question 4 


Build a Bayesian Classification model from the following table of six records:


     attributes 
     a  b  c  d    Category 
R1 | N  Y  N  N    2 
R2 | N  Y  Y  Y    1 
R3 | e  N  N  N    1 
R4 | N  N  N  N    2 
R5 | Y  N  N  Y    2 
R6 | N  Y  Y  N    2 


a) Calculate P(c), the probability of a category occurring, for all categories c. 

b) Calculate P(i|c), the probability of feature i occurring, for all values of
attributes i and categories c. 

c) Let U=(Y,N,N,Y) be an unclassified record. Calculate P(U|c), the probability of the 
features of U occurring, for all categories c. 

d) To what category does the Bayes classifier assign U? 

(2+4+2+2=10 marks) 
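Parts a) to d) follow the usual naive Bayes steps: class priors P(c), per-attribute conditionals P(i|c), their product P(U|c), and the category that maximises P(c) * P(U|c). A minimal Python sketch of those steps is given below. The `records` list is hypothetical (two attributes, five records) and is not the table of this question. The names `train` and `classify` are invented for this example, and no smoothing is applied, so an unseen attribute value gives a probability of zero.

```python
from collections import Counter, defaultdict

def train(records):
    """records: list of (features_tuple, category). Returns (priors, likelihood)."""
    class_counts = Counter(c for _, c in records)
    # (category, attribute position) -> Counter of attribute values
    feature_counts = defaultdict(Counter)
    for features, c in records:
        for i, v in enumerate(features):
            feature_counts[(c, i)][v] += 1
    priors = {c: n / len(records) for c, n in class_counts.items()}  # P(c)

    def likelihood(features, c):
        """P(U|c) as the product of P(i|c), assuming independent attributes."""
        p = 1.0
        for i, v in enumerate(features):
            p *= feature_counts[(c, i)][v] / class_counts[c]
        return p

    return priors, likelihood

def classify(records, features):
    """Return the category with the largest P(c) * P(U|c), plus all scores."""
    priors, likelihood = train(records)
    scores = {c: priors[c] * likelihood(features, c) for c in priors}
    return max(scores, key=scores.get), scores

# Hypothetical records, not the table above.
records = [
    (("Y", "N"), 1),
    (("Y", "Y"), 1),
    (("N", "N"), 2),
    (("N", "Y"), 2),
    (("Y", "N"), 2),
]
label, scores = classify(records, ("Y", "N"))
```

For this toy data the scores are 0.4 * 0.5 for category 1 and 0.6 * (1/3) * (2/3) for category 2, so the record is assigned to category 1.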

Question 5 

Describe each of these model types and provide an example formula: 

a) linear regression 

b) multiple linear regression 


c) non-linear regression 


(9 marks) 


Question 6 


In the context of web page mining, describe the concepts of authority and hub pages. 
(6 marks) 